Lecture 3

Temporal Models and Smoothing

Sara Martino

Dept. of Mathematical Science, NTNU

Janine Illian

University of Glasgow

Motivation

Data are often observed in time, and time dependence is often expected.

  • Observations are correlated in time
  • We also have correlations between the time series (will look at the later…)

Motivation

  1. Smoothing of the time effect

What is our goal?

  1. Smoothing of the time effect

Note: We can use the same model to smooth covariate effects!

What is our goal?

  1. Smoothing of the time effect

  2. Prediction

We can “predict” any unobserved data, does not have to be in the future

Modeling time with INLA

Time can be indexed over a

  • Discrete domain (e.g., years)

  • Continuous domain

Modeling time with INLA

Time can be indexed over a

  • Discrete domain (e.g., years)

    • Main models: RW1, RW2 and AR1

    • Note: RW1 and RW2 are also used for smoothing covariates

  • Continuous domain

    • Here we use the so-called SPDE-approach

Discrete time modelling

Example - (log) Number of Air Passengers in time

Goal we want understand the pattern and predict into the future

Random Walk models

Random walk models encourage the mean of the linear predictor to vary gradually over time.

They do this by assuming that, on average, the time effect at each point is the mean of the effect at the neighboring points.

  • Random Walk of order 1 (RW1) we take the two nearest neighbors

  • Random Walk of order 2 (RW2) we take the four nearest neighbors

Random walks of order 1

Idea: \(\longrightarrow\ u_t = \text{mean}(u_{t-1} , u_{t+1}) + \text{Gaussian error with precision } \tau\)

Definition

\[ \pi(\mathbf{u} \mid \tau) \propto \exp\!\left( -\frac{\tau}{2} \sum_{t=1}^{T-1} (u_{t+1} - u_t)^2 \right) = \exp\!\left(-\tfrac{1}{2} \, \mathbf{u}^{\top} \mathbf{Q}\ \mathbf{u}\right) \] where the precision is \(\mathbf{Q} = \tau\mathbf{R}\) with

\[ \mathbf{R} = \begin{bmatrix} 1 & -1 & & & & \\ -1 & 2 & -1 & & & \\ & & \ddots & \ddots & \ddots & \\ & & & -1 & 2 & -1 \\ & & & & -1 & 1 \end{bmatrix} \]

Random walks of order 1

Idea: \(\longrightarrow\ u_t = \text{mean}(u_{t-1} , u_{t+1}) + \text{Gaussian error with precision } \tau\)

Definition

\[ \pi(\mathbf{u} \mid \tau) \propto \exp\!\left( -\frac{\tau}{2} \sum_{t=1}^{T-1} (u_{t+1} - u_t)^2 \right) = \exp\!\left(-\tfrac{1}{2} \, \mathbf{u}^{\top} \mathbf{Q}\ \mathbf{u}\right) \]

  1. Role of the precision parameter \(\tau\) and prior distribution
  2. RW as intrinsic model

What is the role of the precision parameter?

  • \(\tau\) says how much \(u_t\) can vary around its mean

    • Small \(\tau\) \(\rightarrow\) large variation \(\rightarrow\) less smooth effect
    • Large \(\tau\) \(\rightarrow\) small variation \(\rightarrow\) smoother effect

We need to set a prior distribution for \(\tau\).

A common option is the so called PC-priors

Penalized Complexity (PC) priors

  • PC prior are easily available in inlabru for many model parameters
  • They are build with two principle in mind:

    1. The prior discourage overdispersion by penalizing distance from a base model

  • A line is the base model

  • We want to penalize more complex models

Penalized Complexity (PC) priors

  • PC prior are easily available in inlabru for many model parameters

  • They are build with two principle in mind:

    1. The prior discourage overdispersion by penalizing distance from a base model
    2. User-defined scaling

\[ \text{Prob}\left(\sqrt{\frac{1}{\tau}}>U\right) = \text{Prob}(\sigma>U) = \alpha; \qquad U>0, \ \alpha \in (0,1) \]

  • \(U\) an upper limit for the standard deviation and \(\alpha\) a small probability.

  • \(U\) a likely value for the standard deviation and \(\alpha=0.5\).

Example

The Model \[ \begin{aligned} y_i|\eta_i, \sigma^2 & \sim \mathcal{N}(\eta_i,\sigma^2)\\ \eta_i & = \beta_0 + f(t_i)\\ f(t_1),f(t_2),\dots,f(t_n) &\sim \text{RW2}(\tau) \end{aligned} \]

The code

cmp = ~ Intercept(1) + 
  time(year, model = "rw1",
       hyper = list(prec = 
                      list(prior = "pc.prec",
                           param = c(0.5, 0.5))))

RW as intrinsic models

RW1 defines differences, not absolute levels:

  • Only the changes between neighboring terms are modeled.

  • The model has no information about the global mean (intercept).

  • Mathematically, \[ (u_1,\dots,u_n)\text{ and }(u_1+a,\dots,u_n+a) \] produce identical likelihoods — they’re indistinguishable.

This means:

  • The precision matrix \(\mathbf{Q}\) is singular.

  • Posterior inference is not well-defined unless we fix the overall level.

Solution:

  • Sum to zero constraint \(\sum_{i = 1}^n u_i = 0\)
  • This is be default included in the model

Random walks of order 2

  • Just like RW1, but now we consider 4 neighbours instead of 2 \[ u_t = \text{mean}(u_{t-2} ,u_{t-1} , u_{t+1}, u_{t+2} ) + \text{some Gaussian error with precision } \tau \]

  • RW2 are smoother than RW1

  • The precision has the same role as for RW1

Example

cmp1 = ~ Intercept(1) + 
  time(year, model = "rw1", 
       scale.model = T,
       hyper = list(prec = 
                      list(prior = "pc.prec",
                           param = c(0.3,0.5))))

cmp2 = ~ Intercept(1) + 
  time(year, model = "rw2",
       scale.model = T,
       hyper = list(prec = 
                      list(prior = "pc.prec", 
                           param = c(0.3,0.5))))


lik = bru_obs(formula = Erie~ ., 
              data = lakes)

fit1 = bru(cmp1, lik)
fit2 = bru(cmp2, lik)

NOTE: the scale.model = TRUE option scales the \(\mathbf{Q}\) matrix so the precision parameter has the same interpretation in both models.

Summary RW (1 and 2) models

  • Latent effects suitable for smoothing and modeling temporal data.

  • Has one hyperparameter: the precision \(\tau\)

    • Use PC prior for \(\tau\)
  • It is an intrinsic model

    • The precision matrix \(\mathbf{Q}\) is rank deficient

    • A sum-to-zero constraint is added to make the model identifiable!

  • RW2 models are smoother than RW1

Auto Regressive Models of order 1 (AR1)

Definition

\[ u_t = \phi u_{t-i} + \epsilon_t; \qquad \phi\in(-1,1), \ \epsilon_t\sim\mathcal{N}(0,\tau^{-1}) \] \[ \pi(\mathbf{u}|\tau)\propto\exp\left(-\frac{\tau}{2}\mathbf{u}^T\mathbf{Q}\mathbf{u}\right) \] with \[ \mathbf{Q} = \begin{bmatrix} 1 & -\phi & & & & \\ -\phi & (1+\phi^2) & -\phi & & & \\ & & \ddots & \ddots & \ddots & \\ & & & -\phi & (1+\phi^2) & -\phi \\ & & & & -\phi & 1 \end{bmatrix} \]

AR1: Hyperparameters and prior

The AR1 model has two parameters

  • The precision \(\tau\)
  • The autocorrelation (or persistence) parameter \(\phi\in(0,1)\)

AR1: Hyperparameters and prior

The AR1 model has two parameters

  • The precision \(\tau\)
    • PC prior as before \[ \text{Prob}(\sigma > u) = \alpha \]
  • The autocorrelation (or persistence) parameter \(\phi\in(0,1)\)